AITopics | training step

Collaborating Authors

training step

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

BASIS: Batchwise Advantage Estimation from Single-Rollout Information Sharing for LLM Reasoning

Gong, Shijin, Xu, Erhan, Ye, Kai, Quinzan, Francesco, Livieri, Giulia, Shi, Chengchun

arXiv.org Machine LearningMay-27-2026

Reinforcement learning with verifiable rewards has become a standard recipe for improving the reasoning abilities of large language models. Existing algorithms face a tradeoff between computational efficiency and sample efficiency in value estimation and policy learning. We introduce BASIS, a critic-free post-training algorithm designed to address this tradeoff. At each online training step, BASIS samples only one rollout per prompt, but leverages rich information across prompts in the entire batch to improve value function estimation. Our experiments demonstrate that BASIS reduces MSE in value function estimation by 69% compared to REINFORCE++, a representative single-rollout baseline, and achieves lower MSE with one rollout than group mean estimators with 8 rollouts. This improvement in value estimation translates to better policy optimization: using substantially less training time, BASIS achieves performance close to multi-rollout GRPO-type baselines and often outperforms single-rollout REINFORCE-type baselines.

large language model, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2605.27293

Genre: Research Report (0.50)

Industry: Education > Educational Setting > Online (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.54)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Kernelized Advantage Estimation: From Nonparametric Statistics to LLM Reasoning

Gong, Shijin, Ye, Kai, Zhu, Jin, Zhang, Xinyu, Zhou, Hongyi, Shi, Chengchun

arXiv.org Machine LearningMay-19-2026

Recent advances in large language models (LLMs) have increasingly relied on reinforcement learning (RL) to improve their reasoning capabilities. Three types of approaches have been widely adopted: The first relies on a deep neural network to estimate the value function of the learning policy in order to reduce the variance of the policy gradient. However, estimating and maintaining such a value network incurs substantial computational and memory overhead. The second avoids training a value network by approximating the value function using sample averages. However, it samples a large number of reasoning traces per prompt for accurate value function approximation, making it computationally expensive. The third samples only a single reasoning trajectory per prompt, which reduces computational cost but suffers from poor sample efficiency. This paper focuses on a practical, resource-constrained setting in which only a small number of reasoning traces can be sampled per prompt, while low-variance gradient estimation remains essential for high-quality policy learning. To address this challenge, we bring classical nonparametric statistical methods, which are both computationally and statistically efficient, to LLM reasoning. We employ kernel smoothing as a concrete example for value function estimation and the subsequent policy optimization. Numerical and theoretical results demonstrate that our proposal achieves accurate value and gradient estimation, leading to improved policy optimization.

large language model, machine learning, reinforcement learning, (19 more...)

arXiv.org Machine Learning

2604.28005

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

Add feedback

Appendix ANetwork Architectures

Neural Information Processing SystemsApr-25-2026, 21:58:26 GMT

In this section, we describe the details of the network architectures used in Sec. 4 and 5. We mainly used 4 GPUs (NVIDIAV100; 16GB) for the experiments in Sec. 4 and 5 and it took about 4 hours per seed (in the case of 3M steps). Actually, we conducted exhaustive evaluations through the enormous experiments, and we hope our empirical observations and recommendations help the practitioners to explore the explosive configuration space. Adam Adam Learning rate (policy) 1e-4 5e-5 3e-4 3e-4 Learning rate (value) 1e-4 1e-2 3e-4 3e-4 Weight initialization Uniform Xavier Uniform Xavier Uniform Xavier Uniform Initial output scale (policy) 1.0 1e-4 1e-2 1e-2 Target update Hard - Soft (5e-3) Soft (5e-3) Clipped Double QFalse - True True Table 7: Details of each network architecture. We refer the original implementations of each algorithm which is available online [23, 14, 48, 27, 42].

artificial intelligence, machine learning, training step, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

11e1900e680f5fe1893a8e27362dbe2c-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 01:03:17 GMT

antmaze task, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Appendix for " Self-Diagnosing GAN: Diagnosing Underrepresented Samples in Generative Adversarial Networks "

Neural Information Processing SystemsApr-24-2026, 17:29:03 GMT

Since generated samples resemble the images with low discrepancy scores, emphasizing high-scoring images can boost the diversity in sample generation.

artificial intelligence, discrepancy score, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

0cddb7c06f1cd518e1efdc0e20b70c31-Supplemental.pdf

Neural Information Processing SystemsApr-24-2026, 16:10:14 GMT

artificial intelligence, machine learning, meshgraphnet, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

Two-Time-Scale Learning Dynamics: A Population View of Neural Network Training

Borghi, Giacomo, Im, Hyesung, Pareschi, Lorenzo

arXiv.org Machine LearningMar-26-2026

Population-based learning paradigms, including evolutionary strategies, Population-Based Training (PBT), and recent model-merging methods, combine fast within-model optimisation with slower population-level adaptation. Despite their empirical success, a general mathematical description of the resulting collective training dynamics remains incomplete. We introduce a theoretical framework for neural network training based on two-time-scale population dynamics. We model a population of neural networks as an interacting agent system in which network parameters evolve through fast noisy gradient updates of SGD/Langevin type, while hyperparameters evolve through slower selection--mutation dynamics. We prove the large-population limit for the joint distribution of parameters and hyperparameters and, under strong time-scale separation, derive a selection--mutation equation for the hyperparameter density. For each fixed hyperparameter, the fast parameter dynamics relaxes to a Boltzmann--Gibbs measure, inducing an effective fitness for the slow evolution. The averaged dynamics connects population-based learning with bilevel optimisation and classical replicator--mutator models, yields conditions under which the population mean moves toward the fittest hyperparameter, and clarifies the role of noise and diversity in balancing optimisation and exploration. Numerical experiments illustrate both the large-population regime and the reduced two-time-scale dynamics, and indicate that access to the effective fitness, either in closed form or through population-level estimation, can improve population-level updates.

artificial intelligence, evolutionary algorithm, machine learning, (17 more...)

arXiv.org Machine Learning

2603.19808

Country: